Abstract. Despite the conventional wisdom that proactive security is
superior to reactive security, we show that reactive security can be com- 
petitive with proactive security as long as the reactive defender learns 
from past attacks instead of myopically overreacting to the last attack. 
Our game-theoretic model follows common practice in the security lit- 
erature by making worst-case assumptions about the attacker: we grant 
the attacker complete knowledge of the defender's strategy and do not 
require the attacker to act rationally. In this model, we bound the com- 
petitive ratio between a reactive defense algorithm (which is inspired by 
online learning theory) and the best fixed proactive defense. Additionally, 
we show that, unlike proactive defenses, this reactive strategy is robust 
to a lack of information about the attacker's incentives and knowledge. 



1 Introduction 

Many enterprises employ a Chief Information Security Officer (CISO) to man- 
age the enterprise's information security risks. Typically, an enterprise has many 
more security vulnerabilities than it can realistically repair. Instead of declaring 
the enterprise "insecure" until every last vulnerability is plugged, CISOs typi- 
cally perform a cost-benefit analysis to identify which risks to address, but what 
constitutes an effective CISO strategy? The conventional wisdom [28-21] is that 
CISOs ought to adopt a "forward-looking" proactive approach to mitigating se- 
curity risk by examining the enterprise for vulnerabilities that might be exploited 
in the future. Advocates of proactive security often equate reactive security with 
myopic bug-chasing and consider it ineffective. We establish sufficient conditions 
for when reacting strategically to attacks is as effective in discouraging attackers. 

We study the efficacy of reactive strategies in an economic model of the 
CISO's security cost-benefit trade-offs. Unlike previously proposed economic 
models of security (see Section 7), we do not assume the attacker acts according
to a fixed probability distribution. Instead, we consider a game-theoretic model 
with a strategic attacker who responds to the defender's strategy. As is standard 
in the security literature, we make worst-case assumptions about the attacker. 
For example, we grant the attacker complete knowledge of the defender's strategy 
and do not require the attacker to act rationally. Further, we make conservative 
assumptions about the reactive defender's knowledge and do not assume the 
defender knows all the vulnerabilities in the system or the attacker's incentives.
However, we do assume that the defender can observe the attacker's past actions, 
for example via an intrusion detection system or user metrics [3]. 

In our model, we find that two properties are sufficient for a reactive strategy 
to perform as well as the best proactive strategies. First, no single attack is 
catastrophic, meaning the defender can survive a number of attacks. This is 
consistent with situations where intrusions (that, say, steal credit card numbers) 
are regrettable but not business-ending. Second, the defender's budget is liquid, 
meaning the defender can re-allocate resources without penalty. For example, a 
CISO can reassign members of the security team from managing firewall rules 
to improving database access controls at relatively low switching costs. 

Because our model abstracts many vulnerabilities into a single graph edge, we 
view the act of defense as increasing the attacker's cost for mounting an attack 
instead of preventing the attack (e.g., by patching a single bug). By making 
this assumption, we choose not to study the tactical patch-by-patch interaction 
of the attacker and defender. Instead, we model enterprise security at a more 
abstract level appropriate for the CISO. For example, the CISO might allocate a 
portion of his or her budget to engage a consultancy, such as WhiteHat or iSEC 
Partners, to find and fix cross-site scripting in a particular web application or 
to require that employees use SecurID tokens during authentication. We make
the technical assumption that attacker costs are linearly dependent on defense 
investments locally. This assumption does not reflect patch-by-patch interaction, 
which would be better represented by a step function (with the step placed at the 
cost to deploy the patch). Instead, this assumption reflects the CISO's higher- 
level viewpoint where the staircase of summed step functions fades into a slope. 

We evaluate the defender's strategy by measuring the attacker's cumulative 
return-on-investment, the return-on-attack (ROA), which has been proposed
previously [8]. By studying this metric, we focus on defenders who seek to "cut
off the attacker's oxygen," that is to reduce the attacker's incentives for attack- 
ing the enterprise. We do not distinguish between "successful" and "unsuccessful" 
attacks. Instead, we compare the payoff the attacker receives from his or her ne- 
farious deeds with the cost of performing said deeds. We imagine that sufficiently 
disincentivized attackers will seek alternatives, such as attacking a different or- 
ganization or starting a legitimate business. 

In our main result, we show sufficient conditions for a learning-based reactive 
strategy to be competitive with the best fixed proactive defense in the sense that 
the competitive ratio between the reactive ROA and the proactive ROA is at 
most $1 + \epsilon$, for all $\epsilon > 0$, provided the game lasts sufficiently many rounds (at
least $\Omega(1/\epsilon)$). To prove our theorems, we draw on techniques from the online
learning literature. We extend these techniques to the case where the learner 
does not know all the game matrix rows a priori, letting us analyze situations 
where the defender does not know all the vulnerabilities in advance. Although 
our main results are in a graph-based model with a single attacker, our results 
generalize to a model based on Horn clauses with multiple attackers. Our results
are also robust to switching from ROA to attacker profit and to allowing the
proactive defender to revise the defense allocation a fixed number of times.

Fig. 1.1. An attack graph representing an enterprise data center.

Although myopic bug chasing is most likely an ineffective reactive strategy, we 
find that in some situations a strategic reactive strategy is as effective as the opti- 
mal fixed proactive defense. In fact, we find that the natural strategy of gradually 
reinforcing attacked edges by shifting budget from unattacked edges "learns" the 
attacker's incentives and constructs an effective defense. Such a strategic reactive 
strategy is both easier to implement than a proactive strategy — because it does 
not presume that the defender knows the attacker's intent and capabilities — and 
is less wasteful than a proactive strategy because the defender does not expend 
budget on attacks that do not actually occur. Based on our results, we encourage 
CISOs to question the assumption that proactive risk management is inherently 
superior to reactive risk management. 

Organization. Section 2 formalizes our model. Section 3 shows that perimeter
defense and defense-in-depth arise naturally in our model. Section 4 presents our
main results bounding the competitive ratio of reactive versus proactive defense
strategies. Section 5 outlines scenarios in which reactive security out-performs
proactive security. Section 6 generalizes our results to Horn clauses and multiple
attackers. Section 7 surveys related work. Section 8 concludes.

2 Formal Model 

In this section, we present a game-theoretic model of attack and defense. Unlike 
traditional bug-level attack graphs, our model is meant to capture a managerial 
perspective on enterprise security. The model is somewhat general in the sense 
that attack graphs can represent a number of concrete situations, including a 
network (see Figure 1.1), components in a complex software system [9], or an
Internet Fraud "Battlefield" [□].

System. We model a system using a directed graph (V,E), which defines the 
game between an attacker and a defender. Each vertex $v \in V$ in the graph
represents a state of the system. Each edge $e \in E$ represents a state transition the
attacker can induce. For example, a vertex might represent whether a particular 
machine in a network has been compromised by an attacker. An edge from one 
machine to another might represent that an attacker who has compromised the
first machine might be able to compromise the second machine because the two 
are connected by a network. Alternatively, the vertices might represent different 
components in a software system. An edge might represent that an attacker 
sending input to the first component can send input to the second. 

In attacking the system, the attacker selects a path in the graph that be- 
gins with a designated start vertex s. Our results hold in more general models 
(e.g., based on Horn clauses), but we defer discussing such generalizations until 
Section 6. We think of the attack as driving the system through the series of
state transitions indicated by the edges included in the path. In the networking 
example in Figure 1.1, an attacker might first compromise a front-end server and
then leverage the server's connectivity to the back-end database server to steal 
credit card numbers from the database. 
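
As an illustration of the system model, the following minimal sketch checks whether a sequence of edges forms an attack path from the start vertex. The graph, the vertex names, and the helper `is_attack_path` are hypothetical and introduced only for this example; rejecting the empty path is likewise an assumption of the sketch rather than part of the model.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]


def is_attack_path(edges: Set[Edge], start: str, path: List[Edge]) -> bool:
    """Check that `path` is a chain of existing edges that begins at `start`.

    Assumption: the empty path is rejected, since the start vertex earns no reward.
    """
    if not path:
        return False
    current = start
    for (u, v) in path:
        # Each step must use a real edge and leave from where the last step ended.
        if (u, v) not in edges or u != current:
            return False
        current = v
    return True


# Hypothetical data-center graph; the vertex names are invented for this sketch.
EDGES = {("internet", "frontend"), ("frontend", "database"), ("internet", "vpn")}
two_hop = [("internet", "frontend"), ("frontend", "database")]
print(is_attack_path(EDGES, "internet", two_hop))
```

The two-hop path corresponds to the networking example: the attacker first compromises the front-end server and then uses its connectivity to reach the database.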

Incentives and Rewards. Attackers respond to incentives. For example, at- 
tackers compromise machines and form botnets because they make money from 
spam [20] or rent the botnet to others [32]. Other attackers steal credit card
numbers because credit card numbers have monetary value [10]. We model the
attacker's incentives by attaching a non-negative reward to each vertex. These 
rewards are the utility the attacker derives from driving the system into the state 
represented by the vertex. For example, compromising the database server might 
have a sizable reward because the database server contains easily monetizable 
credit card numbers. We assume the start vertex has zero reward, forcing the 
attacker to undertake some action before earning utility. Whenever the attacker 
mounts an attack, the attacker receives a payoff equal to the sum of the rewards 
of the vertices visited in the attack path: $\mathrm{payoff}(a) = \sum_{v \in a} \mathrm{reward}(v)$. In the
example from Figure 1.1, if an attacker compromises both a front-end server and
the database server, the attacker receives both rewards. 

Attack Surface and Cost. The defender has a fixed defense budget B > 0, 
which the defender can divide among the edges in the graph according to a 
defense allocation $d$: for all $e \in E$, $d(e) \ge 0$ and $\sum_{e \in E} d(e) \le B$.

The defender's allocation of budget to various edges corresponds to the de- 
cisions made by the Chief Information Security Officer (CISO) about where to 
allocate the enterprise's security resources. For example, the CISO might allo- 
cate organizational headcount to fuzzing enterprise web applications for XSS 
vulnerabilities. These kinds of investments are continuous in the sense that the 
CISO can allocate 1/4 of a full-time employee to worrying about XSS. We denote 
the set of feasible allocations of budget $B$ on edge set $E$ by $\mathcal{D}_{B,E}$.

By defending an edge, the defender makes it more difficult for the attacker 
to use that edge in an attack. Each unit of budget the defender allocates to an 
edge raises the cost that the attacker must pay to use that edge in an attack. 
Each edge has an attack surface [19 w that represents the difficulty in defending 
against that state transition. For example, a server that runs both Apache and 
Sendmail has a larger attack surface than one that runs only Apache because 
defending the first server is more difficult than the second. Formally, the attacker 
must pay the following cost to traverse the edges of an attack $a$: $\mathrm{cost}(a, d) = \sum_{e \in a} d(e)/w(e)$.
Allocating defense budget to an edge does not "reduce" an edge's attack surface.
For example, consider defending a hallway with bricks. The wider the hallway 
(the larger the attack surface), the more bricks (budget allocation) required to 
build a wall of a certain height (the cost to the attacker).
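
The payoff and cost definitions can be made concrete with a short sketch. The graph, rewards, attack surfaces, and allocation below are hypothetical values chosen for illustration, and the function names are not from the paper. Counting each visited vertex once is an assumption of the sketch; it makes no difference for simple paths.

```python
def is_feasible(d, budget):
    """A defense allocation is feasible if it is non-negative and within budget."""
    return all(x >= 0 for x in d.values()) and sum(d.values()) <= budget


def payoff(path, reward):
    """Sum of the rewards of the vertices visited by the attack path."""
    visited = {v for edge in path for v in edge}
    return sum(reward.get(v, 0.0) for v in visited)


def cost(path, d, w):
    """Attacker cost: budget on each traversed edge divided by its attack surface."""
    return sum(d.get(e, 0.0) / w[e] for e in path)


# Hypothetical two-edge chain: start -> frontend -> database.
W = {("s", "frontend"): 2.0, ("frontend", "database"): 1.0}
REWARD = {"frontend": 1.0, "database": 5.0}  # the start vertex has zero reward
D = {("s", "frontend"): 1.0, ("frontend", "database"): 1.0}
PATH = [("s", "frontend"), ("frontend", "database")]

print(is_feasible(D, budget=2.0), payoff(PATH, REWARD), cost(PATH, D, W))

# The hallway analogy: doubling an edge's attack surface halves the cost that
# the same budget imposes on that edge.
WIDER = dict(W)
WIDER[("s", "frontend")] = 4.0
print(cost(PATH, D, WIDER))
```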

In this formulation, the function mapping the defender's budget allocation 
to attacker cost is linear, preventing the defender from ever fully defending an 
edge. Our use of a linear function reflects a level of abstraction more appropriate 
to a CISO who can never fully defend assets, which we justify by observing that 
the rate of vulnerability discovery in a particular piece of software is roughly 
constant [29]. At a lower level of detail, we might replace this function with a step 
function, indicating that the defender can "patch" a vulnerability by allocating 
a threshold amount of budget. 

Objective. To evaluate defense strategies, we measure the attacker's incentive 
for attacking using the return-on-attack (ROA) [8], which we define as follows:

$$\mathrm{ROA}(a, d) = \frac{\mathrm{payoff}(a)}{\mathrm{cost}(a, d)}$$

We use this metric for evaluating defense strategy because we believe that if 
the defender lowers the ROA sufficiently, the attacker will be discouraged from 
attacking the system and will find other uses for his or her capital or industry. 
For example, the attacker might decide to attack another system. Analogous 
results hold if we quantify the attacker's incentives in terms of profit (e.g., with 
$\mathrm{profit}(a, d) = \mathrm{payoff}(a) - \mathrm{cost}(a, d)$), but we focus on ROA for simplicity.

A purely rational attacker will mount attacks that maximize ROA. However, 
a real attacker might not maximize ROA. For example, the attacker might not 
have complete knowledge of the system or its defense. We strengthen our results 
by considering all attacks, not just those that maximize ROA. 
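
To make the two objectives concrete, the following sketch computes ROA and profit for every simple path from the start vertex and picks the path a purely rational attacker would choose. The graph and its numbers are hypothetical, restricting attention to simple paths is an assumption made to keep the enumeration finite, and returning zero for a free attack with no payoff is a convention of the sketch.

```python
import math


def simple_paths(edges, start):
    """Yield every non-empty simple path (a list of edges) that begins at `start`."""
    def extend(vertex, visited, path):
        for (u, v) in edges:
            if u == vertex and v not in visited:
                new_path = path + [(u, v)]
                yield new_path
                yield from extend(v, visited | {v}, new_path)
    yield from extend(start, {start}, [])


def payoff(path, reward):
    return sum(reward.get(v, 0.0) for (_, v) in path)


def cost(path, d, w):
    return sum(d.get(e, 0.0) / w[e] for e in path)


def roa(path, d, w, reward):
    """Return-on-attack; unbounded when a rewarding attack costs nothing."""
    p, c = payoff(path, reward), cost(path, d, w)
    if c == 0:
        return math.inf if p > 0 else 0.0  # convention chosen for this sketch
    return p / c


def profit(path, d, w, reward):
    return payoff(path, reward) - cost(path, d, w)


def best_response(edges, start, d, w, reward):
    """The attack a purely rational, ROA-maximizing attacker would mount."""
    return max(simple_paths(edges, start), key=lambda a: roa(a, d, w, reward))


# Hypothetical chain with unit attack surfaces and one unit of budget per edge.
EDGES = [("s", "frontend"), ("frontend", "database")]
W = {e: 1.0 for e in EDGES}
D = {e: 1.0 for e in EDGES}
REWARD = {"frontend": 1.0, "database": 5.0}

best = best_response(EDGES, "s", D, W, REWARD)
print(best, roa(best, D, W, REWARD), profit(best, D, W, REWARD))
```

In this instance the rational attacker continues to the database, because the extra reward outweighs the extra cost under either objective.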

Proactive Security. We evaluate our learning-based reactive approach by com- 
paring it against a proactive approach to risk management in which the defender 
carefully examines the system and constructs a defense in order to fend off future 
attacks. We strengthen this benchmark by providing the proactive defender com- 
plete knowledge about the system, but we require that the defender commit to a 
fixed strategy. To strengthen our results, we state our main result in terms of all 
such proactive defenders. In particular, this class of defenders includes the rational
proactive defender who employs a defense allocation that minimizes the maximum
ROA the attacker can extract from the system: $\arg\min_d \max_a \mathrm{ROA}(a, d)$.
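
The minimax objective can be illustrated by brute force on a very small instance. The sketch below grid-searches how to split a budget between two edges so as to minimize the attacker's best ROA. The two-edge chain, its rewards, and the helper `minimax_two_edges` are hypothetical, and a grid search is used here purely for illustration; it is not a method taken from the paper.

```python
import math


def roa(path, path_payoff, d, w):
    """Return-on-attack: payoff divided by cost, unbounded when the attack is free."""
    c = sum(d[e] / w[e] for e in path)
    return math.inf if c == 0 else path_payoff / c


def minimax_two_edges(e1, e2, attacks, w, budget, steps=200):
    """Grid-search the split of `budget` between two edges that minimizes the
    attacker's best ROA. `attacks` lists (path, payoff) pairs."""
    best_d, best_value = None, math.inf
    for i in range(steps + 1):
        x = budget * i / steps
        d = {e1: x, e2: budget - x}
        worst_case = max(roa(p, pay, d, w) for p, pay in attacks)
        if worst_case < best_value:
            best_d, best_value = d, worst_case
    return best_d, best_value


# Hypothetical chain s -> frontend -> database with unit attack surfaces
# and a reward of 1 at each non-start vertex.
SF, FD = ("s", "frontend"), ("frontend", "database")
W = {SF: 1.0, FD: 1.0}
ATTACKS = [([SF], 1.0), ([SF, FD], 2.0)]

best_d, best_value = minimax_two_edges(SF, FD, ATTACKS, W, budget=2.0)
print(best_d, best_value)
```

In this particular instance the optimum is not unique: any split that gives the first edge at least half of the budget holds the attacker's ROA to about one, while starving the first edge lets the attacker reach the front end cheaply.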



3 Case Studies 



In this section, we describe instances of our model to build the reader's intu- 
ition. These examples illustrate that some familiar security concepts, including 
perimeter defense and defense in depth, arise naturally as optimal defenses in our 
model. These defenses can be constructed either by rational proactive defenders
or converged to by a learning-based reactive defense. 
Fig. 3.1. Attack graph representing a simplified data center network.



Perimeter Defense. Consider a system in which the attacker's reward is non- 
zero at exactly one vertex, t. For example, in a medical system, the attacker's 
reward for obtaining electronic medical records might well dominate the value of 
other attack targets such as employees' vacation calendars. In such a system, a 
rational attacker will select the minimum-cost path from the start vertex s to the 
valuable vertex t. The optimal defense limits the attacker's ROA by maximizing 
the cost of the minimum s-t path. The algorithm for constructing this defense 
is straightforward [7]:

1. Let C be the minimum weight s-t cut in (V, E, w). 

2. Select the following defense: 

   $$d(e) = \begin{cases} B\,w(e)/Z & \text{if } e \in C \\ 0 & \text{otherwise} \end{cases} \qquad \text{where } Z = \sum_{e \in C} w(e).$$

Notice that this algorithm constructs a perimeter defense: the defender allocates 
the entire defense budget to a single cut in the graph. Essentially, the defender 
spreads the defense budget over the attack surface of the cut. By choosing the 
minimum-weight cut, the defender is choosing to defend the smallest attack
surface that separates the start vertex from the target vertex. Real defenders 
use similar perimeter defenses, for example, when they install a firewall at the 
boundary between their organization and the Internet because the network's 
perimeter is much smaller than its interior. 
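
The two steps of the perimeter construction can be sketched as follows. For brevity the sketch finds the minimum-weight cut by enumerating vertex subsets, which is only practical for tiny graphs; this enumeration is a stand-in chosen for self-containment and is not the algorithm of [7]. The example graph and its weights are hypothetical.

```python
from itertools import combinations


def min_weight_cut(vertices, w, s, t):
    """Brute-force minimum-weight s-t cut; returns (edges in the cut, total weight)."""
    others = [v for v in vertices if v not in (s, t)]
    best_cut, best_weight = None, float("inf")
    for k in range(len(others) + 1):
        for extra in combinations(others, k):
            source_side = {s, *extra}
            # Edges leaving the source side form the cut for this partition.
            cut = [(u, v) for (u, v) in w if u in source_side and v not in source_side]
            weight = sum(w[e] for e in cut)
            if weight < best_weight:
                best_cut, best_weight = cut, weight
    return best_cut, best_weight


def perimeter_defense(vertices, w, s, t, budget):
    """Spread the whole budget over the cut in proportion to attack surface."""
    cut, z = min_weight_cut(vertices, w, s, t)
    return {e: (budget * w[e] / z if e in cut else 0.0) for e in w}


# Hypothetical graph with two routes from s to t.
V = ["s", "a", "b", "t"]
W = {("s", "a"): 3.0, ("s", "b"): 2.0, ("a", "t"): 1.0, ("b", "t"): 4.0}

cut, z = min_weight_cut(V, W, "s", "t")
d = perimeter_defense(V, W, "s", "t", budget=6.0)
print(sorted(cut), z, d)
```

Because every cut edge receives budget in proportion to its attack surface, each cut edge costs the attacker the same amount, $B/Z$, so both routes to the target are equally expensive in this example.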

Defense in Depth. Many experts in security practice recommend that defenders
employ defense in depth. Defense in depth arises naturally in our model as an
optimal defense for some systems. Consider, for example, the system depicted 
in Figure 3.1. This attack graph is a simplified version of the data center network
depicted in Figure 1.1. Although the attacker receives the largest reward
for compromising the back-end database server, the attacker also receives some 
reward for compromising the front-end web server. Moreover, the front-end web 
server has a larger attack surface than the back-end database server because 
the front-end server exposes a more complex interface (an entire enterprise web 
application), whereas the database server exposes only a simple SQL interface. 
Allocating defense budget to the left-most edge represents trying to protect sen- 
sitive database information with a complex web application firewall instead of 
database access control lists (i.e., possible, but economically inefficient). 

The optimal defense against a rational attacker is to allocate half of the 
defense budget to the left-most edge and half of the budget to the right-most 
edge, limiting the attacker to a ROA of unity. Shifting the entire budget to 
the right-most edge (i.e., defending only the database) is disastrous because the
attacker will simply attack the front-end at zero cost, achieving an unbounded 
ROA. Shifting the entire budget to the left-most edge is also problematic because 
the attacker will attack the database (achieving an ROA of 5). 

4 Reactive Security 

To analyze reactive security, we model the attacker and defender as playing 
an iterative game, alternating moves. First, the defender selects a defense, and 
then the attacker selects an attack. We present a learning-based reactive defense 
strategy that is oblivious to vertex rewards and to edges that have not yet been 
used in attacks. We prove a theorem bounding the competitive ratio between 
this reactive strategy and the best proactive defense via a series of reductions 
to results from the online learning theory literature. Other applications of this 
literature include managing stock portfolios [26], playing zero-sum games [12],
and boosting other machine learning heuristics [11]. Although we provide a few
technical extensions, our main contribution comes from applying results from 
online learning to risk management. 

Repeated Game. We formalize the repeated game between the defender and 
the attacker as follows. In each round $t$ from 1 to $T$:

1. The defender chooses defense allocation $d_t(e)$ over the edges $e \in E$.

2. The attacker chooses an attack path $a_t$ in $G$.

3. The path $a_t$ and attack surfaces $\{w(e) : e \in a_t\}$ are revealed to the defender.

4. The attacker pays $\mathrm{cost}(a_t, d_t)$ and gains $\mathrm{payoff}(a_t)$.

In each round, we let the attacker choose the attack path after the defender 
commits to the defense allocation because the defender's budget allocation is not 
a secret (in the sense of a cryptographic key). Following the "no security through
obscurity" principle, we make the conservative assumption that the attacker can 
accurately determine the defender's budget allocation. 

Defender Knowledge. Unlike proactive defenders, reactive defenders do not 
know all of the vulnerabilities that exist in the system in advance. (If defend- 
ers had complete knowledge of vulnerabilities, conferences such as Black Hat 
Briefings would serve little purpose.) Instead, we reveal an edge (and its attack 
surface) to the defender after the attacker uses the edge in an attack. For exam- 
ple, the defender might monitor the system and learn how the attacker attacked 
the system by doing a post-mortem analysis of intrusion logs. Formally, we define 
a reactive defense strategy to be a function from attack sequences $\{a_i\}$ and the
subsystem induced by the edges contained in $\bigcup_i a_i$ to defense allocations such
that $d(e) = 0$ if edge $e \notin \bigcup_i a_i$. Notice that this requires the defender's strategy
to be oblivious to the system beyond the edges used by the attacker. 
Algorithm 1 A reactive defense strategy for hidden edges.

- Initialize $E_0 = \emptyset$

- For each round $t \in \{2, \ldots, T\}$

  • Let $E_{t-1} = E_{t-2} \cup E(a_{t-1})$

  • For each $e \in E_{t-1}$, let

    $$S_{t-1}(e) = \begin{cases} S_{t-2}(e) + M(e, a_{t-1}) & \text{if } e \in E_{t-2} \\ M(e, a_{t-1}) & \text{otherwise,} \end{cases}$$

    $$\tilde{P}_t(e) = \beta_{t-1}^{S_{t-1}(e)}, \qquad P_t(e) = \frac{\tilde{P}_t(e)}{\sum_{e' \in E_{t-1}} \tilde{P}_t(e')},$$

  where $M(e, a) = -\mathbf{1}[e \in a]/w(e)$ is a matrix with $|E|$ rows and a column for
  each attack.



Algorithm. Algorithm 1 is a reactive defense strategy based on the multiplicative
update learning algorithm [5,12]. The algorithm reinforces edges on the
attack path multiplicatively, taking the attack surface into account by allocating
more budget to easier-to-defend edges. When new edges are revealed, the
algorithm re-allocates budget uniformly from the already-revealed edges to the
newly revealed edges. We state the algorithm in terms of a normalized defense
allocation $P_t(e) = d_t(e)/B$. Notice that this algorithm is oblivious to unattacked
edges and the attacker's reward for visiting each vertex. An appropriate setting
for the algorithm parameters $\beta_t \in [0, 1)$ will be described below.

The algorithm begins without any knowledge of the graph whatsoever, and 
so allocates no defense budget to the system. Upon the $t$-th attack on the system,
the algorithm updates $E_t$ to be the set of edges revealed up to this point, and
updates $S_t(e)$ to be a weighted count of the number of times $e$ has been used in an
attack thus far. For each edge that has ever been revealed, the defense allocation
$P_{t+1}(e)$ is chosen to be $\beta_t^{S_t(e)}$ normalized to sum to unity over all edges $e \in E_t$. In
this way, any edge attacked in round t will have its defense allocation reinforced. 
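
The update described above can be sketched in a few lines. This is one possible reading of Algorithm 1, not a reference implementation: the class name `ReactiveDefender`, its methods, and the example edges are hypothetical, and the schedule $\beta_s = (1 + \sqrt{2 \log |E_s| / (s+1)})^{-1}$ is taken from the parameter sequence quoted in the theorems of this section.

```python
import math


def beta_schedule(s, n_edges):
    """Parameter beta_s for round s when n_edges edges have been revealed."""
    return 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_edges) / (s + 1)))


class ReactiveDefender:
    """Multiplicative-update defender that only knows edges seen in attacks."""

    def __init__(self, budget):
        self.budget = budget
        self.score = {}   # S(e): minus the weighted attack count, revealed edges only
        self.round = 1    # index t of the round whose allocation is produced next

    def allocation(self):
        """Defense allocation d_t = B * P_t for the current round."""
        if not self.score:
            return {}     # nothing revealed yet, so nothing is defended
        beta = beta_schedule(self.round - 1, len(self.score))
        weights = {e: beta ** s for e, s in self.score.items()}
        total = sum(weights.values())
        return {e: self.budget * x / total for e, x in weights.items()}

    def observe(self, attack, surfaces):
        """Record the attack of the current round; `surfaces` gives w(e) on it."""
        for e in set(attack):
            # M(e, a) = -1/w(e) for edges on the attack, 0 for all other edges.
            self.score[e] = self.score.get(e, 0.0) - 1.0 / surfaces[e]
        self.round += 1


# Hypothetical run: two attacks on two different edges.
defender = ReactiveDefender(budget=10.0)
defender.observe([("s", "a")], {("s", "a"): 2.0})
defender.observe([("s", "b")], {("s", "b"): 1.0})
print(defender.allocation())
```

In this run the second edge ends up with the larger share of the budget, since it has the smaller attack surface and each attack on it therefore counts for more.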

The parameter $\beta$ controls how aggressively the defender reallocates defense
budget to recently attacked edges. If $\beta$ is infinitesimal, the defender will move
the entire defense budget to the edge on the most recent attack path with the
smallest attack surface. If $\beta$ is enormous, the defender will not be very agile and,
instead, leave the defense budget in the initial allocation. For an appropriate
value of $\beta$, the algorithm will converge to the optimal defense strategy, for
instance, the min cut in the example from Section 3.

Theorems. To compare this reactive defense strategy to all proactive defense 
strategies, we use the notion of regret from online learning theory. The following 
is an additive regret bound relating the attacker's profit under reactive and 
proactive defense strategies. 
Theorem 1 The average attacker profit against Algorithm 1 converges to the
average attacker profit against the best proactive defense. Formally, if defense
allocations $\{d_t\}_{t=1}^T$ are output by Algorithm 1 with parameter sequence
$\beta_s = \left(1 + \sqrt{2 \log |E_s| / (s + 1)}\right)^{-1}$ on any system $(V, E, w, \mathrm{reward}, s)$ revealed online
and any attack sequence $\{a_t\}_{t=1}^T$, then

$$\frac{1}{T} \sum_{t=1}^T \mathrm{profit}(a_t, d_t) - \frac{1}{T} \sum_{t=1}^T \mathrm{profit}(a_t, d^\star) \le B \sqrt{\frac{\log |E|}{2T}} + \frac{B \left(\overline{w^{-1}} + \log |E|\right)}{T}$$

for all proactive defense strategies $d^\star \in \mathcal{D}_{B,E}$, where $\overline{w^{-1}} = |E|^{-1} \sum_{e \in E} w(e)^{-1}$ is
the mean of the surface reciprocals.

Remark 2 We can interpret Theorem 1 as establishing sufficient conditions
under which a reactive defense strategy is within an additive constant of the best 
proactive defense strategy. Instead of carefully analyzing the system to construct 
the best proactive defense, the defender need only react to attacks in a principled 
manner to achieve almost the same quality of defense in terms of attacker profit. 

Reactive defense strategies can also be competitive with proactive defense strate- 
gies when we consider an attacker motivated by return on attack (ROA). The 
ROA formulation is appealing because (unlike with profit) the objective function 
does not require measuring attacker cost and defender budget in the same units. 
The next result considers the competitive ratio between the ROA for a reactive 
defense strategy and the ROA for the best proactive defense strategy. 

Theorem 3 The ROA against Algorithm 1 converges to the ROA against the best
proactive defense. Formally, consider the cumulative ROA:

$$\mathrm{ROA}\left(\{a_t\}_{t=1}^T, \{d_t\}_{t=1}^T\right) = \frac{\sum_{t=1}^T \mathrm{payoff}(a_t)}{\sum_{t=1}^T \mathrm{cost}(a_t, d_t)}$$

(We abuse notation slightly and use singleton arguments to represent the corresponding
constant sequence.) If defense allocations $\{d_t\}_{t=1}^T$ are output by Algorithm 1
with parameters $\beta_s = \left(1 + \sqrt{2 \log |E_s| / (s + 1)}\right)^{-1}$ on any system
$(V, E, w, \mathrm{reward}, s)$ revealed online, such that $|E| > 1$, and any attack sequence
$\{a_t\}_{t=1}^T$, then for all $\alpha > 0$ and proactive defense strategies $d^\star \in \mathcal{D}_{B,E}$

$$\frac{\mathrm{ROA}\left(\{a_t\}_{t=1}^T, \{d_t\}_{t=1}^T\right)}{\mathrm{ROA}\left(\{a_t\}_{t=1}^T, d^\star\right)} \le 1 + \alpha,$$

provided $T$ is sufficiently large.¹

Remark 4 Notice that the reactive defender can use the same algorithm regardless
of whether the attacker is motivated by profit or by ROA. As discussed
in Section 5, the optimal proactive defense is not similarly robust.



1 To wit: $T \ge \left(\frac{13}{\sqrt{2}} \left(1 + \alpha^{-1}\right) \sum_{e \in \mathrm{inc}(s)} w(e)\right)^2 \log |E|$.
We present proofs of these theorems in the appendix. We first prove the theorems
in the simpler setting where the defender knows the entire graph. Second, we 
remove the hypothesis that the defender knows the edges in advance. 

Lower Bounds. In Appendix A, we use a two-vertex, two-edge graph to establish
a lower bound on the competitive ratio of the ROA for all reactive strategies.
The lower bound shows that the analysis of Algorithm 1 is tight and that Algorithm 1
is optimal given the information available to the algorithm. The proof
gives an example where the best proactive defense (slightly) out-performs every 
reactive strategy, suggesting the benchmark is not unreasonably weak. 

5 Advantages of Reactivity 

In this section, we examine some situations in which a reactive defender out- 
performs a proactive defender. Proactive defenses hinge on the defender's model 
of the attacker's incentives. If the defender's model is inaccurate, the defender 
will construct a proactive defense that is far from optimal. By contrast, a reactive 
defender need not reason about the attacker's incentives directly. Instead, the 
reactive defender learns these incentives by observing the attacker in action. 

Learning Rewards. One way to model inaccuracies in the defender's estimates 
of the attacker's incentives is to hide the attacker's rewards from the defender. 
Without knowledge of the payoffs, a proactive defender has difficulty limiting the 
attacker's ROA. Consider, for example, the star system whose edges have equal 
attack surfaces, as depicted in Figure 5.1. Without knowledge of the attacker's
rewards, a proactive defender has little choice but to allocate the defense budget 
equally to each edge (because the edges are indistinguishable). However, if the 
attacker's reward is concentrated at a single vertex, the competitive ratio for 
attacker's ROA (compared to the rational proactive defense) is the number of 
leaf vertices. (We can, of course, make the ratio worse by adding more vertices.) 
By contrast, the reactive algorithm we analyze in Section 4 is competitive with
the rational proactive defense because the reactive algorithm effectively learns 
the rewards by observing which attacks the attacker chooses. 
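
The star example can be checked numerically. The sketch below compares the cumulative ROA under three defenses: the uniform proactive allocation, the rational proactive allocation that knows where the reward is, and a multiplicative-update reactive defender in the spirit of Algorithm 1. The number of leaves, the budget, the number of rounds, and the assumption that the attacker attacks the single rewarded leaf in every round are all hypothetical choices made for illustration.

```python
import math


def beta(s, n_edges):
    """Parameter schedule quoted in the theorems of Section 4."""
    return 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_edges) / (s + 1)))


def play(attacks, w, budget):
    """Run the multiplicative-update defender against a fixed attack sequence
    and return the defense allocation it used in each round."""
    score, allocations = {}, []
    for t, attack in enumerate(attacks, start=1):
        if score:
            b = beta(t - 1, len(score))
            weights = {e: b ** s for e, s in score.items()}
            total = sum(weights.values())
            allocations.append({e: budget * x / total for e, x in weights.items()})
        else:
            allocations.append({})  # no edge has been revealed yet
        for e in set(attack):
            score[e] = score.get(e, 0.0) - 1.0 / w[e]
    return allocations


def cumulative_roa(attacks, allocations, w, reward):
    """Total payoff over total cost; assumes the total cost is positive."""
    total_payoff = sum(reward.get(v, 0.0) for a in attacks for (_, v) in a)
    total_cost = sum(d.get(e, 0.0) / w[e]
                     for a, d in zip(attacks, allocations) for e in a)
    return total_payoff / total_cost


# Hypothetical star: n leaves, unit attack surfaces, all reward at leaf v3.
n, budget, T = 5, 10.0, 50
W = {("s", "v%d" % i): 1.0 for i in range(n)}
REWARD = {"v3": 1.0}
target = ("s", "v3")
attacks = [[target]] * T

uniform = [{e: budget / n for e in W}] * T   # proactive, rewards unknown
rational = [{target: budget}] * T            # proactive, rewards known
reactive = play(attacks, W, budget)

baseline = cumulative_roa(attacks, rational, W, REWARD)
ratio_uniform = cumulative_roa(attacks, uniform, W, REWARD) / baseline
ratio_reactive = cumulative_roa(attacks, reactive, W, REWARD) / baseline
print(ratio_uniform, ratio_reactive)
```

Under these assumptions the uniform defense is worse than the rational proactive defense by a factor equal to the number of leaves, while the reactive defender loses only the first, undefended round, so its ratio is $T/(T-1)$.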

Robustness to Objective. Another way to model inaccuracies in the defender's
estimates of the attacker's incentives is to assume the defender is mistaken about
which of profit and ROA actually matters to the attacker. The defense
constructed by a rational proactive defender depends crucially on whether the 
attacker's actual incentives are based on profit or based on ROA, whereas the 
reactive algorithm we analyze in Section 4 is robust to this variation. In particular,
consider the system depicted in Figure 5.2 and assume the defender has a
budget of 9. If the defender believes the attacker is motivated by profit, the ratio- 
nal proactive defense is to allocate the entire defense budget to the right-most 
edge (making the profit 1 on both edges). However, this defense is disastrous 
when viewed in terms of ROA because the ROA for the left edge is infinite (as 
opposed to near unity when the proactive defender optimizes for ROA). 
Fig. 5.1. Star-shaped attack graph with rewards concentrated in an unknown vertex.

Fig. 5.2. An attack graph that separates the minimax strategies optimizing
ROA and attacker profit.



Catachresis. The defense constructed by the rational proactive defender is op- 
timized for a rational attacker. If the attacker is not perfectly rational, there is 
room for out-performing the rational proactive defense. There are a number of 
situations in which the attacker might not mount "optimal" attacks: 

— The attacker might not have complete knowledge of the attack graph. Con- 
sider, for example, a software vendor who discovers five equally severe vulner- 
abilities in one of their products via fuzzing. According to proactive security, 
the defender ought to dedicate equal resources to repairing these five vul- 
nerabilities. However, a reactive defender might dedicate more resources to 
fixing a vulnerability actually exploited by attackers in the wild. We can 
model these situations by making the attacker oblivious to some edges. 

— The attacker might not have complete knowledge of the defense allocation. 
For example, an attacker attempting to invade a corporate network might 
target computers in human resources without realizing that attacking the 
customer relationship management database in sales has a higher return-on- 
attack because the database is lightly defended. 

By observing attacks, the reactive strategy learns a defense tuned for the actual 
attacker, causing the attacker to receive a lower ROA. 



6 Generalizations 



Horn Clauses. Thus far, we have presented our results using a graph-based 
system model. Our results extend, however, to a more general system model 
based on Horn clauses. Datalog programs, which are based on Horn clauses, have 
been used in previous work to represent vulnerability-level attack graphs [27]. A
Horn clause is a statement in propositional logic of the form $p_1 \wedge p_2 \wedge \cdots \wedge p_n \rightarrow q$.
The propositions $p_1, p_2, \ldots, p_n$ are called the antecedents, and $q$ is called the
consequent. The set of antecedents might be empty, in which case the clause 
simply asserts the consequent. Notice that Horn clauses are negation-free. In 
some sense, a Horn clause represents an edge in a hypergraph where multiple 
pre-conditions are required before taking a certain state transition. 

In the Horn model, a system consists of a set of Horn clauses, an attack 
surface for each clause, and a reward for each proposition. The defender allocates 
defense budget among the Horn clauses. To mount an attack, the attacker selects 
a valid proof: an ordered list of rules such that each antecedent appears as a 
consequent of a rule earlier in the list. For a given proof $\Pi$,

$$\mathrm{cost}(\Pi, d) = \sum_{c \in \Pi} d(c)/w(c), \qquad \mathrm{payoff}(\Pi) = \sum_{p \in [\Pi]} \mathrm{reward}(p),$$

where $[\Pi]$ is the set of propositions proved by $\Pi$ (i.e., those propositions that
appear as consequents in $\Pi$). Profit and ROA are computed as before.

Our results generalize to this model directly. Essentially, we need only replace 
each instance of the word "edge" with "Horn clause" and "path" with "valid proof." 
For example, the rows of the matrix M used throughout the proof become the 
Horn clauses, and the columns become the valid proofs (which are numerous, 
but no matter). The entries of the matrix become $M(c, \Pi) = 1/w(c)$, analogous
to the graph case. The one non-obvious substitution is inc(s), which becomes 
the set of clauses that lack antecedents. 
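As an illustration of the Horn model, the following minimal Python sketch checks whether an ordered list of clauses is a valid proof and evaluates the cost and payoff definitions above. The `Clause` type, the function names, and the three-clause example system are hypothetical and chosen only for this sketch.

```python
from typing import Dict, FrozenSet, List, NamedTuple


class Clause(NamedTuple):
    """A Horn clause p_1 ^ ... ^ p_n -> q with attack surface w(c)."""
    antecedents: FrozenSet[str]
    consequent: str
    surface: float


def is_valid_proof(proof: List[Clause]) -> bool:
    """Each antecedent must be the consequent of an earlier clause in the list."""
    proved = set()
    for clause in proof:
        if not clause.antecedents <= proved:
            return False
        proved.add(clause.consequent)
    return True


def cost(proof: List[Clause], defense: Dict[Clause, float]) -> float:
    """Sum of d(c) / w(c) over the clauses used in the proof."""
    return sum(defense.get(c, 0.0) / c.surface for c in proof)


def payoff(proof: List[Clause], reward: Dict[str, float]) -> float:
    """Sum of rewards over the distinct propositions proved."""
    proved = {c.consequent for c in proof}
    return sum(reward.get(p, 0.0) for p in proved)


# Hypothetical system: two clauses without antecedents and one rule.
c1 = Clause(frozenset(), "a", 2.0)
c2 = Clause(frozenset(), "b", 1.0)
c3 = Clause(frozenset({"a", "b"}), "q", 4.0)
defense = {c1: 1.0, c2: 0.5, c3: 2.0}
reward = {"b": 1.0, "q": 5.0}
proof = [c1, c2, c3]
```

In this example the clauses `c1` and `c2` play the role of inc(s), since they lack antecedents, and reordering the list so that `c3` comes first makes the proof invalid.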

Multiple Attackers. We have focused on a security game between a single 
attacker and a defender. In practice, a security system might be attacked by 
several uncoordinated attackers, each with different information and different 
objectives. Fortunately, we can show that a model with multiple attackers is 
mathematically equivalent to a model with a single attacker with a randomized 
strategy: Use the set of attacks, one per attacker, to define a distribution over 
edges where the probability of an edge is linearly proportional to the number 
of attacks which use the edge. This precludes the interpretation of an attack as 
an s-rooted path, but our proofs do not rely upon this interpretation and our 
results hold in such a model with appropriate modifications. 
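The reduction from several attackers to one randomized attacker can be sketched in a few lines of Python. The function name and the example attacks are assumptions made for illustration; each attack is treated as a set of edges, so an edge counts once per attack that uses it.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def edge_distribution(attacks: Iterable[Iterable[Hashable]]) -> Dict[Hashable, float]:
    """Probability of an edge is proportional to the number of attacks using it."""
    counts: Counter = Counter()
    for attack in attacks:
        counts.update(set(attack))  # count each edge once per attack
    total = sum(counts.values())
    if total == 0:
        return {}
    return {edge: n / total for edge, n in counts.items()}


# Hypothetical round with three uncoordinated attackers.
attacks = [["e1", "e2"], ["e1"], ["e1", "e3"]]
dist = edge_distribution(attacks)
```

Here edge `e1` is used by all three attacks and receives three fifths of the probability mass, while `e2` and `e3` receive one fifth each.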

Adaptive Proactive Defenders. A simple application of an online learning 
result [18], omitted due to space constraints, modifies our regret bounds for 
a proactive defender who re-allocates budget a fixed number of times. In this 
model, our results remain qualitatively the same. 

7 Related Work 

Anderson [1] and Varian [31] informally discuss (via anecdotes) how the design
of information security must take incentives into account. August and Tunca [2] 
compare various ways to incentivize users to patch their systems in a setting 
where the users are more susceptible to attacks if their neighbors do not patch.

Gordon and Loeb [15] and Hausken [17] analyze the costs and benefits of
security in an economic model (with non-strategic attackers) where the proba- 
bility of a successful exploit is a function of the defense investment. They use this 
model to compute the optimal level of investment. Varian [30] studies various 
(single-shot) security games and identifies how much agents invest in security at 
equilibrium. Grossklags [16] extends this model by letting agents self-insure. 

Miura et al. [24] study externalities that appear due to users having the
same password across various websites and discuss Pareto-improving security
investments. Miura and Bambos [25] rank vulnerabilities according to a random-
attacker model. Skybox and RedSeal offer practical systems that help enterprises
prioritize vulnerabilities based on a random-attacker model. Kumar et al. [22]
investigate optimal security architectures for a multi-division enterprise, taking 
into account losses due to lack of availability and confidentiality. None of the 
above papers explicitly model a truly adversarial attacker. 

Fultz [Tjt] generalizes [16] by modeling attackers explicitly. Cavusoglu et al. [3] 
highlight the importance of using a game-theoretic model over a decision theo- 
retic model due to the presence of adversarial attackers. However, these models 
look at idealized settings that are not generically applicable. Lye and Wing [23]
study the Nash equilibrium of a single-shot game between an attacker and a de-
fender that models a particular enterprise security scenario. Arguably this model
is most similar to ours in terms of abstraction level. However, calculating the
Nash equilibrium requires detailed knowledge of the adversary's incentives, which,
as discussed in the introduction, might not be readily available to the defender.
Moreover, their game contains multiple equilibria, weakening their prescriptions. 

8 Conclusions 

Many security experts equate reactive security with myopic bug-chasing and ig- 
nore principled reactive strategies when they recommend adopting a proactive 
approach to risk management. In this paper, we establish sufficient conditions for 
a learning-based reactive strategy to be competitive with the best fixed proactive 
defense. Additionally, we show that reactive defenders can out-perform proac- 
tive defenders when the proactive defender defends against attacks that never 
actually occur. Although our model is an abstraction of the complex interplay 
between attackers and defenders, our results support the following practical ad- 
vice for CISOs making security investments: 

— Employ monitoring tools that let you detect and analyze attacks against your 
enterprise. These tools help focus your efforts on thwarting real attacks. 

— Make your security organization more agile. For example, build a rigorous 
testing lab that lets you roll out security patches quickly once you detect 
that attackers are exploiting these vulnerabilities. 

— When determining how to expend your security budget, avoid overreacting 
to the most recent attack. Instead, consider all previous attacks, but discount 
the importance of past attacks exponentially.

In some situations, proactive security can out-perform reactive security. For ex-
ample, reactive approaches are ill-suited for defending against catastrophic at- 
tacks because there is no "next round" in which the defender can use information 
learned from the attack. We hope our results will lead to a productive discussion 
of the limitations of our model and the validity of our conclusions. 

Instead of assuming that proactive security is always superior to reactive 
security, we invite the reader to consider when a reactive approach might be 
appropriate. For the parts of an enterprise where the defender's budget is liquid 
and there are no catastrophic losses, a carefully constructed reactive strategy can 
be as effective as the best proactive defense in the worst case and significantly 
better in the best case.

Algorithm 2 Reactive defense strategy for known edges using the multiplicative
update algorithm.

- For each $e \in E$, initialize $P_1(e) = 1/|E|$.

- For each round $t \in \{2, \ldots, T\}$ and $e \in E$, let

$$P_t(e) = P_{t-1}(e) \cdot \beta^{M(e, a_{t-1})} / Z_t, \qquad \text{where } Z_t = \sum_{e' \in E} P_{t-1}(e') \, \beta^{M(e', a_{t-1})}.$$
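A minimal Python sketch of the multiplicative update in Algorithm 2 follows. It assumes the game matrix $M(e, a) = -1/w(e)$ if $e \in a$ and 0 otherwise, the parameter $\beta = (1 + \sqrt{2 \log|E| / T})^{-1}$, and the budget scaling $d_t = B \cdot P_t$, all as used in the appendix; the function name, argument names, and the example system are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


def multiplicative_update(
    surface: Dict[str, float],
    attacks: Sequence[Set[str]],
    budget: float = 1.0,
) -> List[Dict[str, float]]:
    """Return defense allocations d_1, ..., d_T for a known edge set.

    surface maps each edge to its attack surface w(e); attacks[t-1] is the
    set of edges used by the attack in round t. Assumes at least one round.
    """
    edges = list(surface)
    T = len(attacks)
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(len(edges)) / T))
    p = {e: 1.0 / len(edges) for e in edges}          # P_1 is uniform
    allocations = [{e: budget * p[e] for e in edges}]
    for attack in attacks[:-1]:                        # attack a_{t-1} shapes P_t
        # M(e, a) = -1/w(e) on attacked edges, 0 elsewhere (assumed form).
        weights = {
            e: p[e] * beta ** (-1.0 / surface[e] if e in attack else 0.0)
            for e in edges
        }
        z = sum(weights.values())                      # normalizer Z_t
        p = {e: weights[e] / z for e in edges}
        allocations.append({e: budget * p[e] for e in edges})
    return allocations


# Hypothetical two-edge system in which e1 is attacked in every round.
allocs = multiplicative_update({"e1": 1.0, "e2": 1.0}, [{"e1"}] * 3)
```

Because $\beta < 1$ and the exponent is negative on attacked edges, the weight of `e1` grows each round, so budget shifts gradually toward the edge the attacker actually uses rather than jumping there after a single attack.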



A Proofs 

We now describe a series of reductions that establish the main results. First,
we prove Theorem 1 in the simpler setting where the defender knows the entire
graph. Second, we remove the hypothesis that the defender knows the edges in
advance. Finally, we extend our results to ROA.

Profit (Known Edges). Suppose that the reactive defender is granted full
knowledge of the system (V, E, w, reward, s) from the outset. Specifically, the
graph, attack surfaces, and rewards are all revealed to the defender prior to the
first round. Algorithm 2 is a reactive defense strategy that makes use of this
additional knowledge.

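Lemma 5 If defense allocations $\{d_t\}_{t=1}^{T}$ are output by Algorithm 2 with parameter
$\beta = \left(1 + \sqrt{2 \log|E| / T}\right)^{-1}$ on any system (V, E, w, reward, s) and attack
sequence $\{a_t\}_{t=1}^{T}$, then

$$\frac{1}{T} \sum_{t=1}^{T} \mathrm{profit}(a_t, d_t) - \frac{1}{T} \sum_{t=1}^{T} \mathrm{profit}(a_t, d^*) \le B \sqrt{\frac{\log|E|}{2T}} + \frac{B \log|E|}{T},$$

for all proactive defense strategies $d^* \in \mathcal{D}_{B,E}$.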

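The lemma's proof is a reduction to the following regret bound from online
learning [12, Corollary 4].

Theorem 6 If the multiplicative update algorithm (Algorithm 2) is run with any
game matrix M with elements in [0, 1], and parameter $\beta = \left(1 + \sqrt{2 \log|E| / T}\right)^{-1}$,
then

$$\frac{1}{T} \sum_{t=1}^{T} M(P_t, a_t) - \min_{P^*} \left\{ \frac{1}{T} \sum_{t=1}^{T} M(P^*, a_t) \right\} \le \sqrt{\frac{\log|E|}{2T}} + \frac{\log|E|}{T}.$$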

Proof (of Lemma 5). Due to the normalization by $Z_t$, the sequence of defense
allocations $\{P_t\}_{t=1}^{T}$ output by Algorithm 2 is invariant to adding a constant to
all elements of matrix M. Let M' be the matrix obtained by adding constant
C to all entries of arbitrary game matrix M, and let sequences $\{P_t\}_{t=1}^{T}$ and
$\{P'_t\}_{t=1}^{T}$ be obtained by running multiplicative update with matrix M and M'
respectively. Then, for all $e \in E$ and $t \in [T - 1]$,

$$P'_{t+1}(e) = \frac{P_1(e) \, \beta^{\sum_{i=1}^{t} M'(e, a_i)}}{\sum_{e' \in E} P_1(e') \, \beta^{\sum_{i=1}^{t} M'(e', a_i)}} = \frac{\beta^{tC} \, P_1(e) \, \beta^{\sum_{i=1}^{t} M(e, a_i)}}{\beta^{tC} \sum_{e' \in E} P_1(e') \, \beta^{\sum_{i=1}^{t} M(e', a_i)}} = P_{t+1}(e).$$

In particular, Algorithm 2 produces the same defense allocation sequence as if
the game matrix elements are increased by one to

$$M'(e, a) = \begin{cases} 1 - 1/w(e) & \text{if } e \in a, \\ 1 & \text{otherwise.} \end{cases}$$



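Because this new matrix has entries in [0,1] we can apply Theorem 6 to prove
for the original matrix M that

$$\frac{1}{T} \sum_{t=1}^{T} M(P_t, a_t) - \min_{P^*} \left\{ \frac{1}{T} \sum_{t=1}^{T} M(P^*, a_t) \right\} \le \sqrt{\frac{\log|E|}{2T}} + \frac{\log|E|}{T}. \qquad \text{(A.1)}$$

Now, by definition of the original game matrix,

$$M(P_t, a_t) = \sum_{e \in E} -\left(P_t(e)/w(e)\right) \cdot \mathbf{1}[e \in a_t] = -\sum_{e \in a_t} P_t(e)/w(e) = -B^{-1} \sum_{e \in a_t} d_t(e)/w(e) = -B^{-1} \, \mathrm{cost}(a_t, d_t).$$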



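Thus Inequality (A.1) is equivalent to

$$\frac{1}{T} \sum_{t=1}^{T} -B^{-1} \, \mathrm{cost}(a_t, d_t) - \min_{d^* \in \mathcal{D}_{B,E}} \left\{ \frac{1}{T} \sum_{t=1}^{T} -B^{-1} \, \mathrm{cost}(a_t, d^*) \right\} \le \sqrt{\frac{\log|E|}{2T}} + \frac{\log|E|}{T}.$$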



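Simple algebraic manipulation yields

$$\begin{aligned}
&\frac{1}{T} \sum_{t=1}^{T} \mathrm{profit}(a_t, d_t) - \min_{d^* \in \mathcal{D}_{B,E}} \left\{ \frac{1}{T} \sum_{t=1}^{T} \mathrm{profit}(a_t, d^*) \right\} \\
&\quad = \frac{1}{T} \sum_{t=1}^{T} \left( \mathrm{payoff}(a_t) - \mathrm{cost}(a_t, d_t) \right) - \min_{d^* \in \mathcal{D}_{B,E}} \left\{ \frac{1}{T} \sum_{t=1}^{T} \left( \mathrm{payoff}(a_t) - \mathrm{cost}(a_t, d^*) \right) \right\} \\
&\quad = \frac{1}{T} \sum_{t=1}^{T} \left( -\mathrm{cost}(a_t, d_t) \right) - \min_{d^* \in \mathcal{D}_{B,E}} \left\{ \frac{1}{T} \sum_{t=1}^{T} \left( -\mathrm{cost}(a_t, d^*) \right) \right\} \\
&\quad \le B \sqrt{\frac{\log|E|}{2T}} + \frac{B \log|E|}{T}.
\end{aligned}$$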
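The shift-invariance step used at the start of the proof of Lemma 5 can be checked numerically. The sketch below runs a generic multiplicative update on a small matrix and on the same matrix with every entry increased by one; the matrix values, the attack sequence, and the helper name `run` are assumptions chosen for illustration.

```python
from typing import Dict, List, Sequence


def run(matrix: Dict[str, Dict[str, float]],
        columns: Sequence[str],
        beta: float) -> List[Dict[str, float]]:
    """Multiplicative update over the rows of matrix, one column per round."""
    rows = list(matrix)
    p = {r: 1.0 / len(rows) for r in rows}
    history = [dict(p)]
    for col in columns:
        weights = {r: p[r] * beta ** matrix[r][col] for r in rows}
        z = sum(weights.values())
        p = {r: weights[r] / z for r in rows}
        history.append(dict(p))
    return history


# Hypothetical game: M(e, a) = -1/w(e) if e is in a, else 0.
M = {"e1": {"a": -0.5, "b": 0.0}, "e2": {"a": 0.0, "b": -0.25}}
# Shifted matrix M' = M + 1, whose entries lie in [0, 1].
M_shifted = {r: {c: v + 1.0 for c, v in row.items()} for r, row in M.items()}

attack_sequence = ["a", "a", "b", "a"]
original = run(M, attack_sequence, beta=0.6)
shifted = run(M_shifted, attack_sequence, beta=0.6)
max_gap = max(
    abs(original[t][r] - shifted[t][r])
    for t in range(len(original))
    for r in M
)
```

The two runs agree up to floating-point error because the common factor contributed by the shift cancels in the normalization.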

Profit (Hidden Edges). The standard algorithms in online learning assume
that the rows of the matrix are known in advance. Here, the edges are not
known in advance and we must relax this assumption using a simulation ar-
gument, which is perhaps the least obvious part of the reduction. The defense
allocation chosen by Algorithm 1 at time t is precisely the same as the defense
allocation that would have been chosen by Algorithm 2 had the defender run
Algorithm 2 on the currently visible subgraph. The following lemma formalizes
this equivalence. Note that Algorithm 1's parameter is reactive: it corresponds
to Algorithm 2's parameter, but for the subgraph induced by the edges re-
vealed so far. That is, $\beta_t$ depends only on edges visible to the defender in round
t, letting the defender actually run the algorithm.

Lemma 7 Consider arbitrary round $t \in [T]$. If Algorithms 1 and 2 are run
with parameters $\beta_s = \left(1 + \sqrt{2 \log|E_s| / (s + 1)}\right)^{-1}$ for $s \in [t]$ and parameter
$\beta = \left(1 + \sqrt{2 \log|E_t| / (t + 1)}\right)^{-1}$ respectively, with the latter run on the subgraph
induced by $E_t$, then the defense allocations $P_{t+1}(e)$ output by the algorithms are
identical for all $e \in E_t$.

Proof. If $e \in E_t$ then $\tilde{P}_{t+1}(e) = \beta^{\sum_{i=1}^{t} M(e, a_i)}$ because $\beta_t = \beta$, and the round
t + 1 defense allocation of Algorithm 1, $P_{t+1}$, is simply $\tilde{P}_{t+1}$ normalized to sum
to unity over edge set $E_t$, which is exactly the defense allocation output by
Algorithm 2.

Armed with this correspondence, we show that Algorithm 1 is almost as effective
as Algorithm 2. In other words, hiding unattacked edges from the defender does
not cause much harm to the reactive defender's ability to disincentivize the
attacker.

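Lemma 8 If defense allocations $\{d_{1,t}\}_{t=1}^{T}$ and $\{d_{2,t}\}_{t=1}^{T}$ are output by Algo-
rithms 1 and 2 with parameters $\beta_t = \left(1 + \sqrt{2 \log|E_t| / (t + 1)}\right)^{-1}$ for $t \in [T - 1]$
and $\beta = \left(1 + \sqrt{2 \log|E| / T}\right)^{-1}$, respectively, on a system (V, E, w, reward, s)
and attack sequence $\{a_t\}_{t=1}^{T}$, then

$$\frac{1}{T} \sum_{t=1}^{T} \mathrm{profit}(a_t, d_{1,t}) - \frac{1}{T} \sum_{t=1}^{T} \mathrm{profit}(a_t, d_{2,t}) \le \frac{B}{T} \, \overline{w^{-1}}.$$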
t=i t—i 

Proof. Consider attack $a_t$ from a round $t \in [T]$ and consider an edge $e \in a_t$.
If $e \in a_s$ for some s < t, then the defense budget allocated to e at time t by
Algorithm 2 cannot be greater than the budget allocated by Algorithm 1. Thus,
the instantaneous cost paid by the attacker on e when Algorithm 1 defends is
at least the cost paid when Algorithm 2 defends: $d_{1,t}(e)/w(e) \ge d_{2,t}(e)/w(e)$.
If $e \notin \bigcup_{s=1}^{t-1} a_s$ then for all $s \in [t]$, $d_{1,s}(e) = 0$, by definition. The sequence
$\{d_{2,s}(e)\}_{s=1}^{t}$ is decreasing and positive. Thus $\max_{s \le t} d_{2,s}(e) - d_{1,s}(e)$ is optimized
at s = 1 and is equal to $B/|E|$. Finally, because each edge $e \in E$ is first revealed
exactly once, this leads to

$$\sum_{t=1}^{T} \mathrm{cost}(a_t, d_{2,t}) - \sum_{t=1}^{T} \mathrm{cost}(a_t, d_{1,t}) = \sum_{t=1}^{T} \sum_{e \in a_t} \frac{d_{2,t}(e) - d_{1,t}(e)}{w(e)} \le \sum_{e \in E} \frac{B}{|E| \, w(e)}.$$



Combining this with the fact that the attacker receives the same payout whether
Algorithm 2 or Algorithm 1 defends completes the result.

Proof (of Theorem 1). The result follows immediately from Lemma 5 and Lemma 8.

Finally, notice that Algorithm 1 enjoys the same time and space complexities
as Algorithm 2, up to constants.

ROA (Hidden Edges). We now translate our bounds on profit into bounds
on ROA by observing that the ratio of two quantities is small if the quantities are
large and their difference is small. We consider the competitive ratio between
a reactive defense strategy and the best proactive defense strategy after the
following technical lemma, which asserts that the quantities are large.

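Lemma 9 For all attack sequences $\{a_t\}_{t=1}^{T}$, $\max_{d^* \in \mathcal{D}_{B,E}} \sum_{t=1}^{T} \mathrm{cost}(a_t, d^*) \ge VT$,
where game value V is $\max_{d \in \mathcal{D}_{B,E}} \min_a \mathrm{cost}(a, d) = B / \sum_{e \in \mathrm{inc}(s)} w(e) > 0$, where
$\mathrm{inc}(v) \subseteq E$ denotes the edges incident to vertex v.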

Proof. Let $d^* = \arg\max_{d \in \mathcal{D}_{B,E}} \min_a \mathrm{cost}(a, d)$ witness the game's value V; then
$\max_{d \in \mathcal{D}_{B,E}} \sum_{t=1}^{T} \mathrm{cost}(a_t, d) \ge \sum_{t=1}^{T} \mathrm{cost}(a_t, d^*) \ge TV$. Consider the defen-
sive allocation $\tilde{d}$ defined for each $e \in E$ as follows. If $e \in \mathrm{inc}(s)$, let
$\tilde{d}(e) = B \, w(e) / \sum_{e' \in \mathrm{inc}(s)} w(e') > 0$, and otherwise $\tilde{d}(e) = 0$. This allocation is
feasible because

$$\sum_{e \in E} \tilde{d}(e) = \frac{B \sum_{e \in \mathrm{inc}(s)} w(e)}{\sum_{e \in \mathrm{inc}(s)} w(e)} = B.$$

By definition $\tilde{d}(e)/w(e) = B / \sum_{e' \in \mathrm{inc}(s)} w(e')$ for each edge e incident to s. There-
fore, $\mathrm{cost}(a, \tilde{d}) \ge B / \sum_{e \in \mathrm{inc}(s)} w(e)$ for any non-trivial attack a, which necessar-
ily includes at least one s-incident edge. Finally, $V \ge \min_a \mathrm{cost}(a, \tilde{d})$ proves

$$V \ge \frac{B}{\sum_{e \in \mathrm{inc}(s)} w(e)}. \qquad \text{(A.2)}$$

Now, consider a defense allocation d and fix an attack a that minimizes the total
attacker cost under d. At most one edge $e \in a$ can have d(e) > 0, for otherwise the
cost under d can be reduced by removing an edge from a. Moreover any attack
$a \in \arg\min_{e \in \mathrm{inc}(s)} d(e)/w(e)$ minimizes attacker cost under d. Thus the maximin
V is witnessed by defense allocations that maximize $\min_{e \in \mathrm{inc}(s)} d(e)/w(e)$. This
maximization is achieved by allocation $\tilde{d}$ and so Inequality (A.2) is an equality.
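The closed form for the game value in Lemma 9 can be illustrated with a short Python sketch. It computes the allocation that spreads the budget over the s-incident edges in proportion to their attack surfaces, so that every such edge charges the attacker the same amount. The function name and the two-edge example are hypothetical.

```python
from typing import Dict, Tuple


def equalizing_allocation(incident_surface: Dict[str, float],
                          budget: float) -> Tuple[Dict[str, float], float]:
    """Return (allocation, game value) for the edges incident to s.

    incident_surface maps each s-incident edge to its attack surface w(e).
    The allocation gives each edge B * w(e) / sum of w, so d(e) / w(e) is the
    same on every edge and equals V = B / sum of w.
    """
    total = sum(incident_surface.values())
    allocation = {e: budget * w / total for e, w in incident_surface.items()}
    return allocation, budget / total


# Hypothetical start vertex with two outgoing edges and budget B = 2.
surfaces = {"e1": 1.0, "e2": 3.0}
d_tilde, V = equalizing_allocation(surfaces, budget=2.0)
per_edge_cost = {e: d_tilde[e] / surfaces[e] for e in surfaces}

# For comparison, a uniform split leaves the larger surface cheaper to attack.
uniform = {e: 1.0 for e in surfaces}
uniform_worst_case = min(uniform[e] / surfaces[e] for e in surfaces)
```

In this example every s-incident edge costs the attacker 0.5 under the equalizing allocation, whereas the uniform split guarantees only one third.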

We are now ready to prove the main ROA theorem:

Proof (of Theorem 3). First, observe that for all B > 0 and all $A, C \in \mathbb{R}$,

$$\frac{A}{B} \le C \iff A - B \le (C - 1) B. \qquad \text{(A.3)}$$

We will use this equivalence to convert the regret bound on profit to the desired
bound on ROA. Together Theorem 1 and Lemma 9 imply



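$$\begin{aligned}
\alpha \sum_{t=1}^{T} \mathrm{cost}(a_t, d_t)
&\ge \alpha \max_{d^* \in \mathcal{D}_{B,E}} \sum_{t=1}^{T} \mathrm{cost}(a_t, d^*) - \alpha \frac{B}{\sqrt{2}} \sqrt{T \log|E|} - \alpha B \left( \log|E| + \overline{w^{-1}} \right) \\
&\ge \alpha V T - \alpha \frac{B}{\sqrt{2}} \sqrt{T \log|E|} - \alpha B \left( \log|E| + \overline{w^{-1}} \right), \qquad \text{(A.4)}
\end{aligned}$$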
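where $V = \max_{d \in \mathcal{D}_{B,E}} \min_a \mathrm{cost}(a, d) > 0$. If

$$\sqrt{T} \ge \frac{13}{\sqrt{2}} \left( 1 + \alpha^{-1} \right) \sqrt{\log|E|} \sum_{e \in \mathrm{inc}(s)} w(e),$$

we can use inequalities $V = B / \sum_{e \in \mathrm{inc}(s)} w(e)$, $\overline{w^{-1}} \le 2 \log|E|$ (since |E| > 1),
and $\left( \sum_{e \in \mathrm{inc}(s)} w(e) \right)^{-1} \le 1$ to show

$$\sqrt{T} \ge \left( (1 + \alpha) B + \sqrt{[(1 + \alpha) B]^2 + 24 \alpha V (1 + \alpha) B} \right) \left( 2 \sqrt{2} \alpha V \right)^{-1} \sqrt{\log|E|},$$

which combines with Theorem 1 and Inequality (A.4) to imply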

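$$\begin{aligned}
\alpha \sum_{t=1}^{T} \mathrm{cost}(a_t, d_t)
&\ge \alpha V T - \alpha \frac{B}{\sqrt{2}} \sqrt{T \log|E|} - \alpha B \left( \log|E| + \overline{w^{-1}} \right) \\
&\ge \frac{B}{\sqrt{2}} \sqrt{T \log|E|} + B \left( \log|E| + \overline{w^{-1}} \right) \\
&\ge \sum_{t=1}^{T} \mathrm{profit}(a_t, d_t) - \min_{d^* \in \mathcal{D}_{B,E}} \sum_{t=1}^{T} \mathrm{profit}(a_t, d^*) \\
&= \sum_{t=1}^{T} \left( -\mathrm{cost}(a_t, d_t) \right) - \min_{d^* \in \mathcal{D}_{B,E}} \sum_{t=1}^{T} \left( -\mathrm{cost}(a_t, d^*) \right) \\
&= \max_{d^* \in \mathcal{D}_{B,E}} \sum_{t=1}^{T} \mathrm{cost}(a_t, d^*) - \sum_{t=1}^{T} \mathrm{cost}(a_t, d_t).
\end{aligned}$$

Finally, combining this equation with Equivalence (A.3) yields the result

$$\begin{aligned}
\frac{\mathrm{ROA}\left( \{a_t\}_{t=1}^{T}, \{d_t\}_{t=1}^{T} \right)}{\min_{d^* \in \mathcal{D}_{B,E}} \mathrm{ROA}\left( \{a_t\}_{t=1}^{T}, d^* \right)}
&= \frac{\sum_{t=1}^{T} \mathrm{payoff}(a_t)}{\sum_{t=1}^{T} \mathrm{cost}(a_t, d_t)} \cdot \max_{d^* \in \mathcal{D}_{B,E}} \frac{\sum_{t=1}^{T} \mathrm{cost}(a_t, d^*)}{\sum_{t=1}^{T} \mathrm{payoff}(a_t)} \\
&= \frac{\max_{d^* \in \mathcal{D}_{B,E}} \sum_{t=1}^{T} \mathrm{cost}(a_t, d^*)}{\sum_{t=1}^{T} \mathrm{cost}(a_t, d_t)} \\
&\le 1 + \alpha.
\end{aligned}$$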
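The threshold on $\sqrt{T}$ that involves the constant 24 can be read as the positive root of a quadratic in $\sqrt{T}$. The sketch below rests on that reading, which is our assumption: after bounding $\overline{w^{-1}}$ by $2 \log|E|$, the middle step of the proof needs $\alpha V T - (1 + \alpha)(B/\sqrt{2})\sqrt{T \log|E|} - 3(1 + \alpha) B \log|E| \ge 0$. The function names and the numeric parameters are hypothetical.

```python
import math


def sqrt_t_threshold(alpha: float, budget: float, value: float,
                     num_edges: int) -> float:
    """Positive root, in sqrt(T), of the quadratic described in the lead-in."""
    a = (1.0 + alpha) * budget
    root = math.sqrt(a * a + 24.0 * alpha * value * a)
    return (a + root) / (2.0 * math.sqrt(2.0) * alpha * value) * math.sqrt(
        math.log(num_edges))


def slack(T: float, alpha: float, budget: float, value: float,
          num_edges: int) -> float:
    """Left-hand side of the assumed quadratic condition; non-negative is enough."""
    L = math.log(num_edges)
    return (alpha * value * T
            - (1.0 + alpha) * (budget / math.sqrt(2.0)) * math.sqrt(T * L)
            - 3.0 * (1.0 + alpha) * budget * L)


# Hypothetical parameters: alpha = 0.5, B = 1, V = 0.5, |E| = 4.
params = dict(alpha=0.5, budget=1.0, value=0.5, num_edges=4)
threshold = sqrt_t_threshold(**params)
T_min = math.ceil(threshold ** 2)
```

With these numbers the slack is negative for small T and becomes non-negative once T reaches `T_min`, which is the sense in which the regret terms are eventually dominated by $\alpha V T$.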

B Lower Bounds 

We briefly argue the optimality of Algorithm 1 for a particular graph, i.e. we
show that Algorithm 1 has optimal convergence time for small enough $\alpha$, up
to constants. (For very large $\alpha$, Algorithm 1 converges in constant time, and
therefore is optimal up to constants, vacuously.) The argument considers an at-
tacker who randomly selects an attack path, rendering knowledge of past attacks
useless. Consider a two-vertex graph where the start vertex s is connected to a
vertex r (with reward 1) by two parallel edges $e_1$ and $e_2$, each with an attack
surface of 1. Further suppose that the defense budget B = 1. We first show a
lower bound on all reactive algorithms:

Lemma 10 For all reactive algorithms A, the competitive ratio C is at least
$(x + \Omega(\sqrt{T}))/x$, i.e. at least $(T + \Omega(\sqrt{T}))/T$ because $x \le T$.

Proof. Consider the following random attack sequence: For each round, select an
attack path uniformly and independently at random from the set $\{e_1, e_2\}$. A
reactive strategy must commit
to a defense in every round without knowledge of the attack, and therefore every
strategy that expends the entire budget of 1 inflicts an expected cost of 1/2 in
every round. Thus, every reactive strategy inflicts a total expected cost of (at
most) T/2, where the expectation is over the coin-tosses of the random attack
process.

Given an attack sequence, however, there exists a proactive defense allocation
with better performance. We can think of the proactive defender being prescient
as to which edge ($e_1$ or $e_2$) will be attacked most frequently and allocating the
entire defense budget to that edge. It is well-known (for instance via an analysis
of a one-dimensional random walk) that in such a random process, one of the
edges will occur $\Omega(\sqrt{T})$ more often than the other, in expectation.

By the probabilistic method, a property that is true in expectation must
hold existentially, and, therefore, for every reactive strategy A, there exists an
attack sequence such that A has a cost x, whereas the best proactive strategy
(in retrospect) has a cost $x + \Omega(\sqrt{T})$. Because the payoff of each attack is 1,
the total reward in either case is T. The prescient proactive defender, therefore,
has an ROA of $T/(x + \Omega(\sqrt{T}))$, but the reactive algorithm has an ROA of T/x,
establishing the lemma.

Given this lemma, we show that Algorithm 1 is optimal given the information
available. In this case, n = 2 and, ignoring constants from Theorem 3, we are
trying to match a convergence time T that is at most $(1 + \alpha^{-1})^2$, which is approxi-
mately $\alpha^{-2}$ for small $\alpha$. For large enough T, there exists a constant c such that
$C \ge (T + c\sqrt{T})/T$. By easy algebra, $(T + c\sqrt{T})/T \ge 1 + \alpha$ whenever $T \le c^2/\alpha^2$,
concluding the argument.
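The $\Omega(\sqrt{T})$ gap in the proof of Lemma 10 can be computed exactly for the two-edge example rather than simulated. The sketch below assumes the setting described above (two parallel edges with attack surface 1, budget 1, attacks drawn uniformly at random), so that a defender who places the whole budget on one edge inflicts cost 1 on each attack that uses it; the function names are hypothetical.

```python
import math


def expected_best_fixed_cost(T: int) -> float:
    """E[max(N1, T - N1)] with N1 ~ Binomial(T, 1/2).

    This is the expected total cost inflicted by the prescient proactive
    defender who puts the entire budget on the more frequently attacked edge.
    """
    total = sum(math.comb(T, k) * max(k, T - k) for k in range(T + 1))
    return total / 2 ** T


def expected_reactive_cost(T: int) -> float:
    """Any allocation summing to 1 inflicts expected cost 1/2 per round."""
    return T / 2


T = 400
gap = expected_best_fixed_cost(T) - expected_reactive_cost(T)
normalized_gap = gap / math.sqrt(T)
```

For T = 400 the normalized gap is close to $1/\sqrt{2\pi} \approx 0.4$, which is the constant one expects from the one-dimensional random walk analysis mentioned in the proof.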

We can generalize the above argument of optimality to n > 2 using the
combinatorial Lemma 3.2.1 from [6]. Specifically, we can show that for every n,
there is an n edge graph for which Algorithm 1 is optimal up to constants for
small enough $\alpha$.



